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ABSTRACT 

Selection effects can bedevil the inference of the properties of a population of astronom¬ 
ical catalogues, unavoidably biasing the observed catalogue. This is particularly true when 
mapping interstellar extinction in three dimensions: more extinguished stars are fainter and 
so generally less likely to appear in any magnitude limited catalogue of observations. This 
paper demonstrates how to account for this selection effect when mapping extinction, so that 
accurate and unbiased estimates of the true extinction are obtained. We advocate couching the 
description of the problem explicitly as a Poisson point process, which allows the likelihoods 
employed to be easily and correctly normalised in such a way that accounts for the selection 
functions applied to construct the catalogue of observations. 
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1 INTRODUCTION 

A common task in astrophysics is to infer the characteristics of a 
population from observations of its members. However, generally 
we do not observe all members of the population. Moreover, those 
we do observe do not constitute a unbiased sample, but are selected 
in some systematic fashion. Normally we refer to objects as being 
subject to some ‘selection function’, which gives the probability of 
an object being observed. 

As we do not possess observations of all objects in the popula¬ 
tion, nor those of an unbiased sample, any inference we make about 
the population will potentially be biased if the selection function is 
not accounted for. This has been recognised for some considerable 
time: Malmquist & Hufnagel (1933) and Lutz & Kelker (1973) ex¬ 
amine two particularly well known and well studied biases which 
arise as a consequence of selection functions. The three dimen¬ 
sional mapping of extinction is subject to a similar effect (Neckel 
1966), that if untreated will lead a significant bias in the estimation 
of extinction. 

A well trodden approach to tackling these biases, as taken by 
Lutz & Kelker (1973), follows in the spirit of Eddington (1913), 
whereby one attempts to de-bias the results obtained using a cor¬ 
rection factor. However, the calculation of the correction factor re¬ 
lies on a solid understanding of the population, which is often not 
available, particularly as one might be investigating the relevant 
properties of the population. 

An alternative approach follows from Jeffreys (1938), who 
suggests that it is better to employ a forward modelling approach 
and predict what the data might look like, given the selection func¬ 
tion and some free parameters, and then compare to the data in 
hand. A Bayesian statistical approach is philosophically consistent 


with this paradigm, capable of including the selection function in 
the construction of a statistical model. 

This paper will focus on the impact of selection functions 
when mapping interstellar extinction in three dimensions, though 
in approaching this focus some more general issues are covered. 
Following this introduction, section 2 introduces the issue of selec¬ 
tion functions when mapping extinction, demonstrating how simple 
methods for mapping extinction can be hopelessly biased. This is 
followed by sections 3,4 and 5 that provide the required mathemat¬ 
ical and physical background to approach the problem of mapping 
extinction. They discuss how to model selection functions (sec¬ 
tion 3), introduce the Poisson point process (section 4) and present 
a discussion of how a point process formalism can be employed 
to describe astronomical catalogues, focusing on a treatment of se¬ 
lection functions (section 5). Then, section 6 demonstrates how the 
Poisson point process based methodology can be employed to ob¬ 
tain unbiased estimates of the extinction distance relationship in a 
given direction, whilst section 7 discusses how selection effects can 
also influence observations of the extinction law and how it varies. 
Finally, section 8 sums up. 


2 EXTINCTION MAPPING AND SELECTION 
EUNCTIONS 

In recent years there has been a growing industry in mapping inter¬ 
stellar extinction in three dimensions (e.g Marshall et al. 2006; Sale 
et al. 2009; Majewski, Zasowski & Nidever 2011; Berry et al. 2012; 
Hanson & Bailer-Jones 2014; Green et al. 2014; Lallement et al. 
2014; Schultheis et al. 2014; Sale et al. 2014; Sale & Magorrian 
2014). This has been motivated both by the fact that the extinction 
of starlight by interstellar dust has a significant confounding impact 
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Figure 1. A pseudo-photometric catalogue represented in distance- 
extinction space. Red crosses represent stars that appear in the catalogue, 
black points those that are too faint and so are not included in the cata¬ 
logue. The two black dashed lines delimit the regions where the catalogue 
is 100% complete (bellow the lower line) and 100% incomplete (above the 
higher line). It can be clearly seen that the effect of the faint magnitude limit 
is to preferentially exclude more extinguished stars. The mean distance- 
extinction relationship found by taking the mean of the extinctions of ob¬ 
served stars is distance bins (blue dashed line) can be compared to the true 
mean distance-extinction relationship (black solid line). Not only does the 
binned mean underestimate extinction, but it also unphysically drops with 
respect to distance. 

on observations, that considerably complicates the study of individ¬ 
ual stars and stellar populations, but also because extinction offers 
a direct route to studying the elusive three dimensional structure of 
the interstellar medium (ISM). 

Three dimensional mapping of the extinction in the Galaxy is 
most frequently performed using photometry, owing to its relative 
preponderance. As a result the data catalogues employed feature 
a selection function that is essentially dependent only on the sky 
position and apparent magnitude of the stars. Unfortunately, the 
effects of the selection function are particularly pathological: more 
extinguished stars are fainter and thus less likely to appear in an ob¬ 
served catalogue, therefore a naive analysis would tend to be biased 
towards lower extinctions. The existence of this problem has been 
recognised for some considerable time (e.g. Neckel 1966; Fitzger¬ 
ald 1968; Neckel & Klare 1980). 

At this point, it should also be noted that photometric cata¬ 
logues are typically also subject to a bright magnitude limit. This 
will act to exclude the most local and least extinguished stars from 
a catalogue. However, this regime is typically less well sampled 
due to the lower volume of space it affects. Thus, its effects are less 
noticeable and so, for pedagogical purposes this paper will concen¬ 
trate on the effects of the faint magnitude limit. However, note that 
in practice equal care should be taken with the bright magnitude 
limit as with the faint. 

In order to illustrate the effect of the bias caused by the faint 
magnitude limit, we will first consider a very simple example. A 
sample of stars was simulated, with each star assigned a distance, 
absolute magnitude and extinction. The details of the distance and 
absolute magnitude distributions are qualitatively unimportant, but 
follow the description given in appendix C. An extinction in the 
V band’ was simulated assuming extinction at a given distance to 
be lognormally distributed (following Ostriker, Stone & Gammie 

' For the sake of simplicity, it was assumed that Ay depends only on the 
properties of the simulated ISM and that it was independent of the assumed 


2001). The mean extinction at a distance v follows 

E[Av(i)] =Aniax(l-exp(-j/2kpc)), (1) 

where Amax was arbitrarily set to 5. This mimics observing towards 
the Galactic anticentre. Meanwhile, the standard deviation of ex¬ 
tinction was defined as 

o[Av(i)]=0.3VE[Ay(5)], (2) 

inspired by Fischera & Dopita (2004) and Sale et al. (2014). We 
further assumed that we posses perfect observations such that we 
know the true distance and extinctions, i.e. there are no observa¬ 
tional errors. 

In order to produce something like a real catalogue an apparent 
magnitude based selection function was applied to the sample: all 
stars brighter than a faint magnitude limit were included, whilst all 
those fainter were excluded. The resultant catalogue is visualised in 
Fig. 1. As one would expect, the magnitude limit has had the effect 
of preferentially excluding more extinguished stars from the cata¬ 
logue. This is particularly apparent beyond ~ 5 kpc where almost 
no stars with above average extinction are observed. 

One crude, though apparently appealing, way to estimate the 
mean extinction along a line of sight is to place stars into bins based 
on their distance and then find the mean extinction of stars within 
each bin. Fig. 1 shows a mean distance extinction relationship es¬ 
timated in this manner. Clearly the result obtained is unsatisfac¬ 
tory: beyond ~ 2 kpc the estimate is biased to smaller extinctions 
as a consequence of the incompleteness of the catalogue. Further¬ 
more, the obtained mean distance extinction relationship unphysi¬ 
cally drops with increasing distance. That this approach should be 
so unsuccessful is unsurprising given that it has ignored the role the 
faint magnitude limit has on shaping the observed catalogue. 

There are several approaches to obtaining a better estimate 
of the true distance extinction relationship. Marshall et al. (2006), 
and subsequent studies based on its method, deal with the selection 
function by comparing observations to simulated data subject to a 
selection function that is assumed to be the same as that affecting 
the real data. As a result, assuming they understand their selection 
function, they largely sidestep the complications provided by selec¬ 
tion functions. However, this approach is rigidly bound to the use 
of a particular Galactic model and the quality of any produced 3D 
extinction map will depend on the quality of the Galactic model 
employed. 

An alternative approach, adopted by Vergely et al. (2010), Sale 
(2012), Green et al. (2014) and Sale & Magorrian (2014) is to build 
statistical models that estimate the posterior probabilities of vari¬ 
ous 3D extinction maps given some observed catalogue. So, they 
need to include selection effects in the statistical model in order to 
calculate probabilities correctly. In particular the selection function 
will affect the normalisation of the likelihood employed, since the 
range of observations accessible will be restricted by the selection 
function. We will return to this issue in section 6, where a Poisson 
point process based methodology will be used to infer an unbiased 
estimate of the distance extinction relationship in the presence of a 
selection function. Prior to that, the following sections will develop 
the mathematical framework required in section 6. 


absolute magnitudes, though note that this is not the case in reality (e.g. 
Sale & Magorrian 2015). 
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3 SELECTION FUNCTIONS 

In general we define a function S that determines the probability 
of a given object (potentially a star, galaxy, etc.) appearing in some 
catalogue. 

In all cases the selection function is conditionally indepen¬ 
dent^ of the true parameters of an object given the observations on 
which the selection has been based. This is trivially apparent in e.g. 
complete photometric catalogues that include all sources detected 
in an image. However, it is not necessarily clear that it is also true in 
other, more involved, cases. Consider a sample of galaxies within 
some redshift range compiled using photometric redshifts. Clearly 
the selection will not be marginally independent of the true red- 
shifts of the galaxies, but it is conditionally independent of them, 
given the photometry that has been used to derive the redshifts. 
Consequently, it’s possible to define a selection function that de¬ 
pends only on the selecting data and not on the physical parameters 
of the objects being studied. 

Photometric catalogues are fundamentally subject to bright 
and faint magnitude limits: the brightest stars saturate in images 
and so cannot easily be included in a reliable catalogue, whilst stars 
below some brightness will be too faint to detect above the noise 
from the sky background, etc. Normally this part of the selection 
function can be estimated for entire image fields by smoothing the 
selection function at the limits by the use of e.g. a sigmoid func¬ 
tion so as to account for the variable sky brightness, gain and so on 
across an image. Further selections are sometimes made on photo¬ 
metric catalogues, e.g. colour cuts intended to select samples dom¬ 
inated by certain types of object. 

While photometric surveys tend to shoot blind, imaging all 
sufficiently bright stars, spectroscopic surveys are defined by some 
input catalogue. Typically the input catalogue will be compiled us¬ 
ing data from some photometric survey. For example APOGEE em¬ 
ploys 2MASS (Zasowski et al. 2013). Consequently the selection 
function will depend on the photometry employed and will be con¬ 
ditionally independent of the spectra obtained. However, users will 
often then define subsamples to use, requiring the observed spec¬ 
tra to meet certain quality requirements and/or requiring measured 
quantities from the spectrum (e.g. [Ee/H]) to fall within some range. 
The selection function of this more restricted sample will then de¬ 
pend on both the photometry used to define the input catalogues 
and the observed spectra, including measurements derived from the 
spectra. 

Samples selected ‘by hand’ are generally unsuitable for use 
when studying a population, as it is often simply not possible to 
define the selection function. An exception to this is citizen science 
projects, where the large number of people involved means selec¬ 
tions made by individual humans can be treated statistically. 

A general assumption that is made when considering selection 
functions, that we will also make here, is to assume that the appear¬ 
ance of one object in a catalogue is independent of the appearance, 
or non-appearance, of any other objects. As a result, we will as¬ 
sume that it is possible to write down a selection function for each 
object that depends only on the observations of that object and not 
on those of any other objects in the catalogue. 


^ See appendix A for a reminder of the definitions of conditional and 
marginal independence. 


4 POINT PROCESSES 

A point process is a random process that produces a distribution 
of points within some space. Because a point process is a random 
process, each realisation of it will produce a distribution of points 
that is distinct to other realisations. Examples of their use cover a 
diverse variety of situations, from the distribution of trees within 
some region (Higgle 2003), to the frequency of calls to a telephone 
exchange (Erlang 1909). 

Central to our purposes here is that it is possible to consider 
a catalogue of astrophysical observations as a realisation of a point 
process in the space of observations, as we will discuss in section 5. 

There are a number of different types of point process. Here 
we will concentrate on one of the simplest: the Poisson point pro¬ 
cess. In this section a short summary of only the most salient fea¬ 
tures of Poisson point processes is provided. There are many text 
books available that provide a more detailed introduction to Pois¬ 
son point processes and/or point processes more generally, e.g. 
Kingman (1992), Moller & Waagepetersen (2004) and Illian et al. 
(2008). 


4.1 The Poisson point process 


We start by considering some region y of arbitrary dimensional 
space, described by the position y. Within this region we observe 
some distribution of points, where the points are indistinguishable 
except for their locations. Then the data © consist of the set of N 
locations {yi ,>'2> • • • where points are observed assuming no 
two locations are identical. 

We define an intensity function ^(yl©) which is a determin¬ 
istic function^ of the position y and a set of hyperparameters 0. 
This ^(yl©) defines how probable it is that a point will appear at a 
given y. Eor a given point process to be a Poisson point process we 
require that the number of events within some smaller region OJ of 
y is defined to be Poisson distributed with a mean of 


E[«(%)] = f Hy\@)dy 
JVs 


(3) 


SO that the likelihood follows 


P(«(%)|©) = ^ -^exp j^%{y\@)dy^ (4) 


We also require that the number of events within any two distinct 
regions of space are independent, conditioned upon X. So, for a set 
of disjoint regions {Cfj} = {91 ,%,••• ,9^}, 


M 

p(©l©,{(r,'}) = npW^>S')|0)- (5) 

/=! 

A point process that satisfies these two requirements is a Poisson 
point process. The likelihood for a Poisson point process is 

p(©|©) =exp ^-^/i(y|©)dyj ]^^>i(y„|©). (6) 

The exponential term in the likelihood is a normalisation, covering 
all possible number of points and all values in the space y for each 


^ If X is a stochastic, rather than deterministic function of the the hyper¬ 
parameters then one has a Cox process. Cox processes are similar to Pois¬ 
son point processes, but making inferences on models that employ them is 
somewhat complicated by their stochastic nature (e.g. Adams, Murray & 
MacKay 2009) 
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point. A short derivation that demonstrates that this is a normali¬ 
sation factor can be found in Appendix B. Critically, this normal¬ 
isation depends on the hyperparameters. So, if one seeks to infer 
the hyperparameters, this normalisation must be included through¬ 
out the analysis. In some situations the integral within the exponent 
can be found analytically, otherwise it is necessary to find a reliable 
numerical approximation of it. 

In many real-life situations we will be interested making some 
inference on the hyper parameters © that control the intensity func¬ 
tion A,(x|0). In order to do so we can invoke Bayes’ rule, so that, 
in the most simple case. 


p(0|®) 


p(g>|e)p(e) 

p(©) 

p(g|e)p(Q) 

/p(®|0)p(0)d0’ 


(7) 


where the use of e.g. Markov chain Monte-Carlo (MCMC) circum¬ 
vents the need to calculate the denominator. 


5 DESCRIBING CATALOGUES AS A REALISATION OE 
A POINT PROCESS 

In what follows we are going to model astronomical catalogues as 
a realisation of a Poisson point process. This is a treatment that has 
been explicitly adopted on a number of occasions in the past, in¬ 
cluding by Sarazin (1980), Loredo (2004), Gajjar, Joshi & Kramer 
(2012), Lombardi, Lada & Alves (2013), Tempel et al. (2014) and 
Foreman-Mackey, Hogg & Morton (2014). In addition countless 
other authors have implicitly adopted this approach, without not¬ 
ing or perhaps even realising that they were doing so. Here we 
will extend on these earlier works by considering the application 
to 3D extinction mapping including confronting the issue of selec¬ 
tion functions. 

In what follows we will assume that the space of observations 
includes the measured on sky position of the objects observed in 
addition to a range of other observations, potentially including ap¬ 
parent magnitudes, proper motions, parallaxes, spectra and quanti¬ 
ties derived from the spectra. A catalogue of data then consists of 
observations of these values for a list of objects. 

We make four key assumptions about the data in observed cat¬ 
alogues: 

(i) When conditioned on the underlying ‘true’ intensity func¬ 
tion, the probability of an object, whether or not it is observed, ex¬ 
isting at a point in observation space is conditionally independent 
of the appearance of other objects. 

(ii) The selection function does not depend on the properties of 
more than one object (see section 3). 

(hi) The number of objects observed within some region of ob¬ 
servation space should be Poisson distributed. 

(iv) No two observed objects can exist in the exact same location 
in observation space. 

If the observations are made on continuous scales and conditions (i) 
and (ii) are met, then condition (iii) is essentially always satisfied: 
the law of rare events acting on arbitrarily small intervals of space 
allows the Poisson distribution to be adopted. Meanwhile, the con¬ 
tinuity of the underlying space typically ensures that condition (iv) 
is met. 

When satisfied, these assumptions are sufficient to meet the 
two conditions described in section 4.1, specifically the number of 
objects observed in any region of observation space will be Poisson 


distributed and the number of objects in any two disjoint regions of 
observation space will be independent. Therefore, it is possible for 
an observed catalogue to be modelled as a realisation of a Poisson 
point process. 

A statistical model is employed to describe the probability of 
objects being observed at any given point in parameter space and 
thus defines the intensity function X. The statistical model will in¬ 
clude the selection function that acts on the data: regions favoured 
by the selection function will, in general, exhibit stronger intensi¬ 
ties than those that the selection function suggest we should have 
trouble observing. 

5.1 Catalogue Likelihood 

In section 4 we considered a point process intensity that was di¬ 
rectly defined on the space of observations 9^, producing obser¬ 
vations y. However, in astrophysics it is more common to define 
physical models that set the probability of objects with some phys¬ 
ical parameters x. x could include quantities such as distance, mass, 
metallicity and so on, and the space these physical parameters oc¬ 
cupy is defined as X. 

We essentially never have direct and absolutely precise mea¬ 
surements of any physical parameter, only some noisy observations 
that are in some way related to x. At best our observations may con¬ 
tain direct estimates of physical parameters, but more generally our 
observations will consist of some other quantities, such as apparent 
magnitudes, that depend on x in some potentially complicated way. 
The observations are related to the underlying physical parameters 
through a probability density function (pdf) p(y|x), that maps from 
X to 9^, describing how likely an object with physical parameters 
X is to produce observations y. 

We can consider an intensity function X'{x, 0) that determines 
how frequently objects with certain physical parameters x, e.g. dis¬ 
tance, mass and so on arise on a space of physical parameters X, 
given some hyperparameters 0. If we have a catalogue of stars, 
such an intensity function would include descriptions for the num¬ 
ber density of the stars in space, an initial mass function and a star 
formation history. This is then passed through p(y|x). Thus, if all 
objects were observed, we could define 

X(x,y|0) =p(y|x)A,'(x|0), (8) 

where ^(Xjyl©) is the intensity function on the joint space of both 
the observations and the physical parameters of the objects. 

However, not all objects in the universe appear in any cata¬ 
logue. Therefore, we introduce the selection function Ajy) that de¬ 
scribes the probability that an object with observations y will ap¬ 
pear in the catalogue. So, we therefore have 

X(x,y|0,5) =5(y)p(y|x))i'(x,0). (9) 

In many cases it is prudent to decompose the intensity function 
X'(x,@) into the product of a normalised pdf p(x) and some, pos¬ 
sibly unknown, scaling factor that defines the total number of 
objects (observed and unobserved). So 

X(x,y|?v;,0,5) = fA(;5(>-)p(y|x)p(x|0). (10) 

In some cases HXC may be known a priori, however, in general this 
will not be the case. Note that, as it is a hyperparameter, could 
be included in 0. However, for the sake of clarity, it is kept as a 
separate factor. 

Now that we have defined an intensity function we can return 
to the formalism discussed in section 4 and define the likelihood of 
some data using (6). 
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We consider a catalogue of data © that covers N objects, so 
that © = {yi,y 2 , ■ ■ ■ where y^ is the observations of the nth 
star. Note that if new data were obtained, for example by re-imaging 
a field to produce fresh photometry, then not only will the observa¬ 
tions of each object (i.e. y„) change, but the size of the catalogue, 
N, may well also change, due to a different number of objects be¬ 
ing visible in the new image. Additionally, we define a catalogue 
of physical parameters of the stars {x,} = {xi ,X 2 ,... ,XAf}. Follow¬ 
ing (6) and (10) we get 

p(©,{x,}|llV;,0,5) = exp J^J^dy'dj^X{x',y'\9l,@,S)^ 

N 

X A,(x„,y„|fiv;,0,5) 

n—1 

= exp liv;5(y)p(y|x')p(x'|0)j 

N 

X n (^^(y«)p(y«k«)p(jc«|0)), 

n=l 

( 11 ) 

the likelihood for a Poisson point process on the space {X,(F}, 
the space of observations and physical parameters. A key feature to 
notice is that our likelihood is conditioned on the selection function, 
which we must therefore know if we are to make any inference. 

Often we will not be interested in the physical parameters of 
each of the objects {x,} directly, focusing only on the hyperpa¬ 
rameters 0 that define the population. In that case we can opt to 
marginalise {x,}, simplifying to a Poisson point process on the 
space of observations only, 

p(©|fy,0,5) =exp J^dy'd:^0\CS{y')p{y'W)p{x'\&)^ 

X n (^■Siyn) [ ^^^P(3'«|J^)P(^|0)) • 

K=1 \ J X J 

( 12 ) 

Though, sometimes, it will be difficult to perform the marginal¬ 
isation directly. Therefore, it may be easier to consider the un¬ 
marginalised likelihood (11) and use a numerical scheme, such as 
MCMC to marginalise {x, }. 


5.1.1 Known size catalogues 


We now consider catalogues of an a priori known size. We are 
therefore considering a Poisson point process conditioned on the 
total number of observations. Such a point process is referred to as 
a binomial (point) process. 

Such catalogues are particularly typical of spectroscopic ob¬ 
servations; known size catalogues typically arise when an input 
catalogue has been employed to produce a list of objects to be ob¬ 
served. 

We can derive the likelihood for a fixed size catalogue by start¬ 
ing from that of an unknown sized catalogue, as given by (11) and 
conditioning on the total number of objects N in the catalogue. 


p(©,{x,-}|A,fy,0,5) 


p(©,{x,-},A|gv:,0,5) 

p{N\9i,®,S) 


(13) 


As N is simply the size of the data catalogue © and the set of physi¬ 
cal parameters, the numerator above is the likelihood given by (11). 
N itself follows a Poisson distribution, with an expected number of 


observed stars of 

E[iV|fy,0,5l = j^j^dJ^dy'9tS{y')p{y'\x')p{x'\@). (14) 

Therefore, we can say 

p{N\tsC,&,S) (^J^J^dx'dy'9>CS{y')p{y'\x')p{x'\&)^ 

X exp (^-J^J^dx'dy'9>CSiy')p{y'\x')p{j^\@)^ . 

(15) 


We can then divide (11) by (15) to obtain 


p(© |X,-||A 0 5) = A' n _ ^(yn)PiynMp{Xn\Q) _ 

p( ,1,11 ,0,5) ax'dys{y')p{y\x')p{x'\e)- 


(16) 


Here the product on the right hand side simply corresponds to that 
of N independent draws from a normalised probability distribution 
proportional to 5()'„)p(y„|x„)p(x„|0), that includes the selection 
function. Consequently it is far more intuitive than that for variable 
sized catalogues. More specifically, (16) is simply the multinomial 
distribution in the limit of an infinite number of categories, where 
the category occupancy is never higher than 1. 

This normalisation also makes sense in light of appendix B, 
it is simply that acquired by integrating over all of (X,(T), but for 
a fixed number of objects in the catalogue, as would be expected 
given this probability is conditioned on the size of the catalogue. 

This result is similar to that obtained by others who have 
worked with fixed size catalogues such as Sanders & Binney 
(2015). Note that the likelihood obtained is independent of tXC and 
so there is no need to infer or marginalise it out. 


5.2 Prior and posterior distributions 

Given that we seek to infer the hyperparameters, we now need to 
define the posterior probability distribution. For the case where the 
size of the catalogue is fixed a priori, following (16), the posterior 
is: 


p(0,{x,}|©,A,5) 


p(©,{x,}|A,0,5)p(0) 

p(©|A,5) 

^!p(Q) A -5(y„)p(y„|x„)p(x„|0) 

p{'D\N,S) H fxfydx'dys{y)p{y\x')p{x'\&) ■ 

(17) 


If the size of the catalogue is not fixed, as is the case with photo¬ 
metric catalogues, following (11), the posterior is 


p(0,7V;,{xi}|©,5) = 


p(©,{x,-}|gv;,0,5)p(0,gv;) 

P(©|5) 

exp (-1 l^dydx'9iS{y')p{y'\x')p{x' 


e 


JCf Jx 


n (.^■Siyn)P{ynW)p{Xn\®)) ■ 
(18) 


We now need to place a prior on 0 and Often one will assume 
that the priors on 0 and are independent, so that 

p(0,5V;)=p(0)p(lAO. (19) 
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The choice of p(0) will very much depend on the situation being 
studied. In some cases a sensible p(lA^) will also be implied. If not, 
three natural choices for p(!A^) are: a uniform prior 

vm = k, ( 20 ) 

where k is a constant; Jeffreys prior, 

p(iA0 = rAC-!; (21) 


and a gamma distribution prior, which is the conjugate prior for the 
Poisson distribution. 


p(5v;) 


pa 


5y;a-le-p5V:. 


( 22 ) 


Notice that the uniform and Jeffreys priors are approached as spe¬ 
cial cases of the gamma distribution when p —>■ 0 , and a = 1 or 1/2 
respectively. Additionally, both the uniform and Jeffreys priors are 
improper, that is their integrals over all lA/ are infinite. Improper 
priors often lead to improper posteriors and so should generally be 
avoided if possible. Consequently, from this selection, the use of 
the gamma distribution prior, with P 7 ^ 0 , should be preferred. 

Once armed with a fully defined posterior one can make some 
inference about the hyperparameters. In general, the posteriors 
given by (17) or (18) will not be analytic and so one will have to em¬ 
ploy some numerical means, such as importance sampling, MCMC 
or sequential Monte Carlo (Doucet, De Freitas & Gordon 2001) to 
sample from the posterior. In the case of a priori unknown sized 
catalogues, as given by (18), one has the choice of either inferring 
as it may be a parameter of interest, or marginalising over it. 
Providing p(lA/) follows a gamma distribution, we can straightfor¬ 
wardly perform the marginalisation to obtain 

p(@,{xi}\'D,S) = J d0\Cp{®,^,{xi}\'DyS) 

r r \ 

P + J^dy'dx'S(y')p(y'\x')p{x'\e)j 



X 


p“r(a+A^) p( 0 ) 

r(a) pms) 


N 


n 

n=l 


(-5(y«)p()'«k«)p(jCn|0))- 


(23) 


The marginalisation for the uniform and Jeffreys priors can be re¬ 
covered by simply considering the limiting behaviour when P —> 0 , 
and a = 1 or 1/2 respectively, as discussed above. Also, as p —>■ 0 
and a —> 0 (23) approaches the form of (17), the posterior for a 
fixed size catalogue. However, as the prior approaches any of these 
three special cases the evidence p(©|5,p(lA/)) tends to 0. 


5.3 Example: Malmquist bias 

To initially illustrate the benefits of the point process based method¬ 
ology described in the previous sections we will consider a simpli¬ 
fied example, motivated by Malmquist & Hufnagel (1933). In this 
example, we have measured the distances to a catalogue of stars 
without error and now seek to estimate the distribution of these 
objects in space. However, the catalogue has been subject to a faint 
magnitude limit and so is an incomplete and biased sample from the 
wider population. The finer details of this example are discussed in 
appendix C. 

Given the catalogue, we now seek to infer the scale length of 
the population. The most straightforward way to approach this is 
simply to count the number of stars in a distance bin and divide by 
the volume to obtain an estimate of the density, as shown in Fig 2. 



Figure 2. Representations of the density of stars as a function of distance 
for the catalogue of stars represented in Fig. Cl. The black line shows the 
true density, whilst the histogram shows the number of stars in the cata¬ 
logue. The dashed red line shows the posterior expectation, with the shaded 
area representing a I-a credible interval. Note that the posterior expecta¬ 
tion is in near exact agreement with the true density, so that the two lines 
are essentially overplotted. 



L/kpc 


Figure 3. The posterior distribution on scale length L and total number of 
stars 5\/. The true values are L = 3 kpc and = 1000. 


However, as expected, this approach strongly underestimates the 
scale length; as the distance increases the catalogue becomes pro¬ 
gressively more incomplete, underestimating the density by more 
an order of magnitude beyond ~ 8 kpc. The scale length estimated 
directly from these binned counts is ~ 1.5 kpc, roughly half the true 
value. 

Therefore, we apply the Poisson point process based method¬ 
ology discussed in previous sections. Using this approach we ob¬ 
tained a posterior distribution on the scale length and total num¬ 
ber of stars (Fig. 3) that agrees well with the ‘true’ values used to 
simulate the data. The density distribution implied by the posterior 
expectation values of both parameters is shown in Fig. 2, agree¬ 
ing well with the ‘true’ density distribution, in contrast to the naive 
approach. 


6 EXTINCTION MAPPING 

We will now return to the issue of extinction mapping and con¬ 
sider how to apply the Poisson point process based methodology 
described in the previous sections to a somewhat more realistic ex¬ 
ample than that described in section 2. This example is designed 
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to largely mimic the methods employed by Sale (2012) and Green 
et al. (2014), but a number of assumptions will be made so as to 
reduce the complexity somewhat. 


6.1 Catalogue simulation 

A sample of 10000 stars was simulated, with the spatial distribution 
following that described in section C and the extinction distribution 
the same as in section 2, but for Ar- In the name of simplicity all 
stars were assumed to exhibit solar abundances, lie on the zero age 
main sequence (ZAMS) as defined by Marigo et al. (2008) and be 
in single star systems (i.e. no binaries). The mass of each star was 
assumed to be drawn from a power law IMF with slope -2.3, trun¬ 
cated to the mass range given by the Marigo et al. (2008) ZAMS 
isochrone. In addition, the ratio of extinction between each of the 
bands was assumed to be constant^, set by the values for an AOV 
star in the limit of small extinction subject to a Fitzpatrick (2004) 
"Ry = 3. r reddening law. Consequently each star is described by a 
distance, extinction and mass only, so that for the nth star the fun¬ 
damental parameters are given by = {s„,Arn,tMn}, the distance, 
extinction to and mass of the star respectively. 

Given the mass, absolute magnitudes Mx in each of the SDSS 
bands were taken from the Marigo et al. (2008) ZAMS isochrone. 
Then apparent magnitudes in each band were simulated following 

mx =A/x+Ar^+51ogio(VlOpc) (24) 

Ar 

( iffy- _ '\ 

ln(0.1)-|-2^^) 

mx ~ 9i{mx,Ox{mx)), (26) 


where mx is the expected apparent magnitude of the star in band X, 
mx the ‘observed’ apparent magnitude in that band, the mag¬ 
nitude limit in this band and represents the normal distribution. 
The definition of CTx(mx) includes an additional factor of 0.02 to 
simulate systematic uncertainties, such as zero point error. It has 
has been defined assuming that catalogue has a ‘lO-o’ magnitude 
limit. 

Then a simple magnitude limit is applied. For each star the 
apparent magnitudes in each band are included in the catalogue if 
they are brighter than the faint magnitude limit. If a star is too faint 
in all bands then it does not appear in the catalogue. The limiting 
magnitudes were set to 


n^ = 22 

(27) 

m^ff = 22 

(28) 

m'ff = 22 

(29) 

= 21 

(30) 

m^ff = 20.5, 

(31) 


roughly the typical values for SDSS. Therefore, the selection func¬ 
tion in band X is 


5*^' {mx) 


1 if mx ^ 

0 otherwise. 


(32) 


Note that in reality catalogues tend to become incomplete gradually 
and a sigmoid function would be a more appropriate than the step 
function used here (e.g. Green et al. 2014). 


Note again that this is not true in practice (e.g. Sale & Magorrian 2015) 



Figure 4. A compaiison of a mean distance-extinction relationship found by 
various different approaches. The true distance extinction relationship (as 
used to simulate the data) is shown with a black line. The red line indicates 
the relationship found by a Sale (2012) based algorithm with no treatment 
for the selection effect, whilst the blue line shows the result obtained if a 
Poisson point process based treatment for the selection effects is included, 
with a gamma distribution prior on lAf. The green line indicates the result 
obtained by an algorithm based on the description given in Green et al. 
(2014). 


The observations for each star then comprise of apparent 
magnitudes in the u, g, r, i and z bands and the accompanying un¬ 
certainties given by the <Jx from (25), where observations for one 
or more bands may be missing. Collectively the observations for 
all these stars form a catalogue © = {yp,. --yxi}- From the original 
10000 stars simulated typically only ~ 1000 were bright enough to 
appear in the simulated catalogue. 

The 3D mapping approaches of e.g. Sale (2012) and Green 
et al. (2014) break the sky down into many small pixels or ‘reso¬ 
lution elements’ which are then treated independently. The size of 
the catalogues simulated here is not atypical of those for each reso¬ 
lution element in Sale et al. (2014) and so the example we consider 
is a reasonable approximation to real 3D extinction mapping. 


6.2 Poisson point process based inference 

The point process methodology was again used to determine the 
characteristics of the population that © describe, specifically the 
dependence of extinction on distance. The combination of the like¬ 
lihood and selection function is 




n exp 


n., / 


{mx-mx{Xn)Y 


I -s''^\mx{xn)) 

{mx,o,)^y„-' V ^2%<s\{mx{Xn)) 

( {mx -mx{Xn))^\ , 

“P- -,^2/- / dmx{x„ 

\ 2ojdmx{x„)) j 


2ct|. 


(33) 


where mx{x„) is the expected apparent magnitude of a star in band 
X given some stellar parameters x„. The first line in this equation 
takes the product over those photometric bands for which obser¬ 
vations exist, while the second takes the product over those bands 
without observations and represents the likelihood of not getting 
some observed photometry in that band given stellar parameters 
x„. 
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Figure 5. A demonstration of the impact of raising the magnitude limit on 
the obtained distance-extinction relationship. The true distance extinction 
relationship (as used to simulate the data) is shown with a black line. The red 
lines show the relationships found using a Sale (2012) based algorithm with 
no treatment for the selection effect, whilst the blue line shows the result 
obtained if a Poisson point process based treatment for the selection effects 
is included. In both cases the solid lines indicate a r band faint magnitude 
limit of mj."" = 22, dashed lines indicate a limit of mj.'™ = 21 and dotted 
lines mj.™ = 20. 


For the purposes of this test we assume that the way stars 
are distributed in space, i.e. (C2), and the IMF are known. Con¬ 
sequently the only hyperparameters are those that trace how ex¬ 
tinction varies with distance. Here the relationship between mean 
extinction and distance is parametrized with a piece-wise constant 
function, as in Sale (2012) and Green et al. (2014). The length 
of each piece was set to 200 pc, in effect creating a series of 
200 pc ‘bins’. The standard deviation of extinction as a function 
of distance is described similarly. Therefore, extinction would be 
described by a set of hyperparameters that are the mean (0) and 
standard deviation (Q of extinction in a series of distance bins, 
© = {01,^1,...,e„,C«}-So, 


^(x\Q) = L‘Jt{Ar\Qj,^j) 


kpc) 

jdM' 2(3 kpc)3 ’ 


(34) 


where is the lognormal pdf defined by a mean and stan¬ 

dard deviation. 

Collectively (33) and (34) define the intensity function, as 
in (10), 


?i(x,y|0) = 5V;5(y)p(y|:<:)p(:<:|0), (35) 

where, as before, represents the total number of stars that could 
be included in the catalogue. This intensity function can then be 
used to define the likelihood following (11). 

In order to define a posterior distribution a prior distribution 
p(0,lA(() must be defined. We required that mean extinction must 
be positive and increase with distance and that the standard devia¬ 
tion of extinction must always be positive by setting the prior prob¬ 
ability to zero if any of these conditions are broken. Otherwise the 
prior on 0 was assumed to be flat and independent of 5V(. We will 
consider three possible priors on 9\[j. a uniform prior, Jeffreys prior 
and a gamma distribution prior, as in section 5.2. The gamma distri¬ 
bution prior assumed a mean value of 10000, with 25% uncertainty, 
so that a = 16 and (3 = 1/625. 

As before, the simulated data are meant to represent photome¬ 
try and so the the number of observed stars in the catalogue was not 
fixed a priori. Consequently, the approach discussed in section 5.2 
applies and was marginalised following (23). 


A Metropolis within Gibbs MCMC scheme was employed 
to sample from the posterior (Tierney 1994): each jc„ was up¬ 
dated in turn, then © was updated in one go within each iteration 
of the MCMC algorithm. The integral in the normalisation term 
in (23) was estimated numerically by passing the isochrone em¬ 
ployed through the selection function at a range of distances along 
the line of sight. 

Fig. 4 compares the posterior expectations of mean extinction 
against distance found as described above to the true relationship. 
As is readily apparent the result obtained without any treatment 
for the selection function performs poorly, underestimating extinc¬ 
tion significantly beyond 2 kpc. The reason for this is, as described 
in section 2, the observed catalogue of stars is deficient in more 
extinguished stars at larger distances, since they are too faint to 
observe. By contrast, the posterior expectation found employing 
the Poisson point process methodology and a gamma distribution 
prior performs excellently, obtaining an accurate estimation of the 
true distance extinction relationship at all distances. This approach 
is essentially that adopted by Sale et al. (2014), though simplified 
slightly to match assumptions made when simulating the data. 

In Fig. 5 we consider the impact of raising the magnitude limit 
used to compile the catalogue, requiring that stars pass faint limits 
in the r band of mj.™ = 22,21,20. As the magnitude limit is raised, 
the results obtained using the method of Sale (2012) become pro¬ 
gressively more inaccurate and depart from the true distance ex¬ 
tinction relationship more locally as more distant and/or more ex¬ 
tinguished stars drop out of the catalogue. In contrast, the result 
obtained with the Poisson point process methodology continues to 
match the true distance extinction relationship, though the precision 
of the result decreases somewhat due to the catalogue reducing in 
size and so providing less information. 

Not shown in Fig. 4 are results found using the Poisson point 
process methodology and a flat or Jeffreys prior on ?\/, as given by 
(20) and (21). These largely match the result using the gamma func¬ 
tion prior out to ~ 5 kpc at which point they unphysically diverge 
towards very large extinctions. 

This behaviour essentially stems from the fact that both the 
uniform and Jeffreys priors are improper. One consequence of em¬ 
ploying an improper prior is that the resulting posteriors are also 
improper. MCMC algorithms are unable to converge when employ¬ 
ing an improper posterior and indeed closer examination reveals 
that in both cases the MCMC chains have failed to converge, with 
extinctions at large distances increasing to ever larger values as the 
chains progress. In addition, the posteriors following from either of 
these priors will assign infinite probability mass to infinite values 
of lA/, which in turn allows infinite extinctions at distances beyond 
any observations. Therefore, neither the uniform nor Jeffreys prior 
on 5\/ should be used, and so, of the initial three options considered, 
only the gamma distribution prior produces realistic results. 

Fig. 4 also displays a result based on the method described 
by Green et al. (2014), though again featuring simplifications to 
match the simulations. In this instance this method tends to under¬ 
estimate extinction, both at large distances, where the catalogue is 
more strongly biased towards smaller extinctions, and more locally. 
The formulation of Green et al. (2014), though similar in spirit, ex¬ 
hibits some fundamental differences to the approach described in 
Sale (2012) and Sale et al. (2014). The most obvious difference 
is that they assume that differential extinction^ within any 6' x 6' 


^ Here differential extinction is defined as the variation in extinction with 
respect to changing Galactic coordinates at some fixed distance. 
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Figure 6. The mean value of R 5495 measured from stars appearing in an 
observed catalogue as a function of distance. The different lines indicate 
different values of applied, from bottom to top: A;. = 0.1 (black), A^ = 0.5 
(red), Aj = 1.0 (green). A;. = 1.5 (blue), A^ = 2.0 (magenta). 


region of the sky is negligible. However, Sale et al. (2014) demon¬ 
strates that this is not the case. As a result of excluding small scale 
differential extinction, the Green et al. (2014) method tends to pass 
very close to the posterior expectations of distance and extinction 
for the brightest stars, even when in reality these stars exhibit ex¬ 
tinctions that are significantly above or below the mean value for 
their distance. In this case, two bright and relatively unextinguished 
stars drag the Green et al. (2014) estimate towards an underestimate 
of extinction around 2 kpc, whilst a bright star at ~ 5 kpc pulls the 
estimated mean extinction towards the true value. However, it is 
unclear how, more generally, this effect will influence their results 
and how it will interact with their treatment for the survey selection 
and incompleteness. 

Beyond their treatment of differential extinction, Green et al. 
(2014) present a description of extinction is essentially the same 
as Sale (2012): a hierarchical model under which one has some 
(imperfect) observations of stars that sample a distance extinction 
relationship. However, there are some subtle, but fundamental, dif¬ 
ferences between the posterior distribution described in Green et al. 
(2014) and that described here, associated with the treatment of 
the selection function. A detailed discussion of the methodology of 
Green et al. (2014), examining how these discrepancies arise can 
be found in appendix D. 


7 EXTINCTION LAW VARIATIONS 

In the previous section we examined the role selection effects can 
have on determining the mean extinction. In addition, one might 
also be interested in the wavelength dependence of the opacity of 
the ISM, i.e. the shape of the extinction law, typically parametrized 
by Rv- For example, a number of studies have found that the extinc¬ 
tion law towards the Galactic centre differs significantly from the 
assumed "Ry = 3.1’ average (e.g. Udalski 2003; Nishiyama et al. 
2009; Natafetal. 2013). 

In light of the profound impact selection effects can have on 
measurement of extinction, one wonders whether measurements of 
the shape of the extinction law could be similarly afflicted. To in¬ 
vestigate this we briefly consider a very simple example. Stars were 
distributed in space, assigned masses and absolute magnitudes as 
section 6 , whilst extinction in the z band was set to a constant value 
for all space. Five different values of Aj. were considered, from 
Aj. = 0.1 to Aj. = 2.0, where Aj ~ 2.2Ay when 1^5495 = 3.1. The 


shape of the extinction law employed was parametrized by 1 ^ 5495 , a 
monochromatic measure that is roughly equivalent to Ry- The value 
of 1^5495 for each star was drawn from a Gaussian distribution with 
a mean of 3.1 and a standard deviation of 0.3, following Fitzpatrick 
& Massa (2007). The extinction in each of the SDSS bands, given 
the values of 1 ^ 5495 , A^ and the mass of the star were then found us¬ 
ing the coefficients from Sale & Magorrian (2015). Subsequently, 
apparent magnitudes, including the imposition of some error, were 
found as in section 6 , with stars brighter than the magnitude limits 
in all five bands included in the final catalogue. 

Then the stars in the catalogue were placed in bins of equal 
distance modulus width (using their true distance modulus) and the 
mean value of 1^5495 found for each bin. The results are plotted in 
Fig. 6 . When there is little extinction the retrieved mean values of 
1^5495 unsurprisingly match the true value. However, as the level 
of extinction is increased a systematic and distance dependent bias 
towards larger values of 1^5495 begins to appear. The observed bias 
appears as smaller values of 1^5495 result in more extinction in the u 
band, for some A-, therefore giving the star a fainter u band appar¬ 
ent magnitude and so making it less likely to appear in the observed 
catalogue. 

In light of this, it is apparent that caution should be exercised 
when inferring some spatial variation in the wavelength depen¬ 
dence of extinction, since systematic variations may appear as the 
result of a selection effect induced by the magnitude limits of the 
data employed. Though it is worth noting that the observed effect 
is quite small in the presence of moderate extinction and so may 
be easily be overwhelmed by stochastic noise in small samples. As 
with 3D extinction mapping, Poisson point processes can be used 
to enable the correct construction and normalisation of a likelihood 
function that accounts for the selection function, thus facilitating 
accurate inference of any extinction law variation. 


8 CLOSING DISCUSSION 

This paper has described how catalogues can be modelled as a re¬ 
alisation of a Poisson point process. A significant benefit of this 
approach is that it allows the effects of selection functions to be 
modelled trivially, so that unbiased estimates of the parameters of 
astrophysical populations can be easily obtained. Some of what 
has been described above is intuitive and indeed some of the re¬ 
sults have been derived by authors who have not invoked Poisson 
point processes. However, by employing Poisson point processes 
not only is it easier to construct likelihoods for a catalogue of ob¬ 
servations, but it also becomes more straightforward to draw on 
the significant statistics and machine learning literatures that cover 
point processes. 

Importantly, the use of a Poisson point process makes clear 
how to correctly normalise the likelihood of obtaining some cata¬ 
logue of observations, when intuition alone might fail. There are 
two distinct possibilities as to how the likelihood should be nor¬ 
malised, if the number of objects in the catalogue is fixed a priori 
then the likelihood must only be normalised over the space of ob¬ 
servations and fundamental parameters for each object, as in sec¬ 
tion 5.1.1. If, however, the number of objects in the catalogue is not 
fixed then the normalisation must cover both all possible numbers 
of objects as well as the space of observations and fundamental pa¬ 
rameters for each object. Appendix B demonstrates that this is the 
nature of the normalisation term that appears in the Poisson point 
process likelihoods such as ( 6 ) and (11). Crucially, the normali¬ 
sation of the likelihood typically depends on the hyperparameters. 
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i.e. the characteristics of the population that the observed objects 
are drawn from. Consequently, any approach that either ignores or 
incorrectly treats the normalisation of the likelihood will obtain bi¬ 
ased/incorrect inferences about the population. 

Specific focus has been given to the issue of mapping extinc¬ 
tion in three dimensions. This is an instance where the impact of 
selection functions can be particularly worrisome, as more extin¬ 
guished stars will typically be fainter and so less likely to appear 
in any catalogue of observations. In section 6 it was shown how a 
Poisson point process based likelihood can enable the accurate es¬ 
timation of extinction as a function of distance. This Poisson point 
process based approach is essentially that employed by Sale et al. 
(2014) to obtain unbiased 3D maps of extinction. Section 6 also 
examines in detail how important it is to correctly normalise the 
likelihoods employed, looking at the results of other formulations. 

In the previous sections we have been assuming that the se¬ 
lection function S is well known; all of the posterior distributions 
displayed above are conditioned on S- However, in reality one will 
not have perfect knowledge of S- Consequently, there are two fun¬ 
damental options, either extend the posterior to also infer S or 
marginalise over it. In many cases prior knowledge of the selection 
function will be significant, such that marginalising over it will not 
introduce much extra uncertainty. However, in some cases, includ¬ 
ing confusion limited catalogues, S might not be well known and 
so will be an important source of uncertainty. 

In section 5 four assumptions were listed that must be met on 
order for a catalogue to be described as a realisation of a Poisson 
point process. In the examples described in this paper all of these 
assumptions are met. However, there are other instances where this 
will not be the case. Specifically, the method of Sale & Magorrian 
(2014) fails to meet condition (i): as the extinction to the stars in 
the catalogue is modelled using a Gaussian random field, the ex¬ 
tinction to two stars is explicitly covariant and so the assumption 
of independence is no longer valid. An extension to the approach 
described here, that invokes Markov point processes and can then 
be used with the method of Sale & Magorrian (2014), will be the 
subject of a future paper (Sale in prep.). 
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APPENDIX A: MARGINAL AND CONDITIONAL 
INDEPENDENCE 

If A and B are conditionally independent then 

p(A,S|C)=p(A|C)p(S|C) (Al) 

p(A|S,C)=p(A|C) (A2) 

p(B|A,C)=p(S|C) (A3) 


On the other hand, if A and B are marginally independent then 


p(A,S) =p(A)p(B) 

p(A|S)=p(A) 

p(S|A)=p(S) 


(A4) 

(A5) 

(A6) 


This state is also often referred to simply as independence, though 
the use of the term marginal independence is preferred here to re¬ 
duce ambiguity. The origin of the nomenclature is more obvious if 
we consider that the marginal independence of A and B also im¬ 
plies, e.g. 


p(A|S) = I p{A,C\B)dC 

= P(^), 


(A7) 


where 

p(A,C|S)7^p(A,C). (A8 ) 

Note that marginal independence does not imply conditional inde¬ 
pendence, nor does conditional independence imply marginal inde¬ 
pendence. 


APPENDIX B: NORMALISATION EACTOR Z(0) 

The exponential term in (6) is simply a normalisation factor, cover¬ 
ing all possible number of observed points, from 0 to oo, as well as 
the entire space (T for each point. In order to demonstrate this, we 
start by defining the likelihood, 

(Bi) 

«=0 


We can begin constructing the normalisation Z(0) by considering 
the normalisation for a fixed number of points n 


Z(0|n) = - 

n\ J<y 




(B2) 


where the factorial appears as we can not discriminate between dif¬ 
ferent orderings of the points. Then if we consider all values of n 


Z(0)=£4/'"/ f{Kyn\®)dy\---dyn 


(B3) 


where this is the normalisation factor in (6). This short demonstra¬ 
tion follows from e.g. Moller & Waagepetersen (2004). 



Figure Cl. The catalogue represented in distance-absolute magnitude 
space. Crosses represent stars that appear in the catalogue, points represent 
those that are too faint and so are not included in the catalogue. 


APPENDIX C: DETAILS OF MALMQUIST BIAS 
EXAMPLE 


In section 5.3 an example that demonstrated the use of Poisson 
point process based inference of the spatial distribution of stars in 
the presence of Malmquist bias was briefly discussed. We will now 
discuss more details of this example. 

The catalogue studied consists of a distance and a V -band ap¬ 
parent magnitude for each star and was simulated subject to a short 
list of assumptions. The number density of the observed stars is 
assumed to decline exponentially with distance, with the rate of 
decline set by a scale length of 3 kpc. The stars all exist on a 
square region of the sky, of solid angle fl, as defined by e.g. RA 
and DEC. The luminosity function of the stars is assumed to be 
uniform across the range 0 ^ E < 5 and the absolute magnitude of 
the stars is assumed to be marginally independent of their distance, 
with 


p(Mv') 


I if Os; My <5; 
0 otherwise. 


(Cl) 


at all distances. The catalogue is subject to a step function faint 
magnitude limit at E = 15: all objects with apparent magnitudes 
brighter than this appear in the catalogue, whilst those that are 
fainter do not. Finally we assume the distance and absolute magni¬ 
tude of each star are measured with absolute precision and effects 
such as interstellar extinction, binarity and so on are not included. 
The simulated catalogue Dm is shown in Fig. Cl. 

Given Dm we now seek to infer the scale length of the pop¬ 
ulation utilising the Poisson point process based methodology. We 
can first define the probability distribution for the distance s„ to the 
nth star as 

-s„/L 

p(5„|L) = At_^, (C2) 

where L is the scale length we wish to infer and the factor of 
arises from the Jacobian converting from Cartesian to spherical po¬ 
lar coordinates. The selection function is 


5(E) 


1 ifEs;i5; 
0 otherwise. 


(C3) 


Here the apparent magnitude E is defined normally, i.e. 


E = My-|-51ogio(s/10 pc). 


(C4) 
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Collectively (Cl), (C2) and (C3) define the intensity function, so 
that 


1{s,Mv\!K,L,S) 


5yj2g-s„/L 

m? 

0 


if Vs; 15, Os; My <5; 
otherwise. 


(C5) 


where has again been introduced to indicate the total number of 
stars, both those in the catalogue and those that fail to make the cut. 

As the number of stars in the catalogue is not known a pri¬ 
ori, but depends on the realisation of the underlying Poisson point 
process, we follow the treatment discussed in section 5.2. If we as¬ 
sume uniform and independent priors on both the scale length and 
the total number of stars, we have the posterior distribution 


p{9i,L\'DM,S) ocexp(-y JdsdMvX{s,Mv\9{_,L,S)) 

N 

n=l 


(C6) 


A simple Metropolis MCMC sampler was employed to explore this 
posterior. 


APPENDIX D: THE METHODOLOGY OE GREEN ET AL. 
(2014) 

We will now consider in detail why the method of Green et al. 
(2014) struggles to obtain an accurate estimate of the true extinc¬ 
tion as a function of distance, as shown in Fig. 4. Although we 
will not discuss it in detail, the simple Sale (2012) based result also 
performs poorly, owing to the fact that it largely fails to present a 
proper treatment for the selection function. 

In addition to neglecting differential extinction there are a 
two key differences between the posterior distributions used by 
Green et al. (2014) and those described here and applied in Sale 
et al. (2014). These specifically focus on how the likelihood is nor¬ 
malised. Using the notation adopted here, their likelihood is, as 
given by their equation (5), 

p(®i0)=np(>'«i0)- (Di) 

n=l 

By comparison to (16) we can see that, if p(>’„|0) is properly nor¬ 
malised, the right hand side would be the correct likelihood if and 
only if the number of stars to be observed was fixed. However, in 
that case, the likelihood would be conditioned on N so that the left 
hand side would instead read p(2)|A;,0). Green et al. (2014) ap¬ 
pear to intend that their method is applied to photometry, in which 
case the number of stars in the observed catalogue is a priori un¬ 
known and so they require a likelihood not conditioned on the size 
of the catalogue. In that case they do not have the correct normalisa¬ 
tion as they have not performed the normalisation over the a priori 
unknown number of stars in the catalogue, that gives rise to the 
exponential term in (6) that is missing in their formulation (see ap¬ 
pendix B). Though, this problem need not be completely fatal. As 
discussed in section 5.2, a judicious choice of p(5V]), albeit a p(lA(]) 
that is a posteriori unlikely, can enable one to arrive at a posterior 
distribution similar to what Green et al. (2014) reach. 

However, it is also apparent that their definition of p(}’„|0), or 
in their notation p(fn|a, S), is itself not correctly normalised, since, 
as can be seen in (16), the normalisation factor should depend on 
the hyperparameters. The source of this error in the normalisation 
is also relatively subtle. Adopting their notation. Green et al. (2014) 



Figure Dl. A directed acyclic graph that illustrates the statistical model 
adopted by Green et al. (2014). 


essentially write that, for each star 

p(»i|a) = JJJ diJdEd&p{m,ij,E,&\a) 

= JJJ dijdEd&p{m\iJ,E,&)p{iJ,E,&\a) (D2) 

= JJJ diJdEd@p{m\/j,E,e)p{E\iJ,a)p{/j,@). 

Where m represent the observed apparent magnitudes of the star, /r 
its distance modulus, E the extinction to the star, 0 the star’s spec¬ 
tral type and a are the hyperparameters that control the distance 
extinction relationship. p(ii|/j,a) takes the form of a 8-function, 
non-zero at an extinction defined by a for each fj, thus making it 
straightforward to integrate over E. They then note that the above 
integrand is proportional to the posterior p(/j,£,0|m,flat), that is 
derived when placing a flat prior on E. So that 

p(m|a) =p(m|flat) JJJ dpdEd@p{p,E,Q\m,fia.t)^J^^. 

(D3) 

They then go on to introduce S, a vector that indicates whether 
the star has been observed in each photometric band, and define 
the posterior distribution p{iJ,E,@\m,S, flat) in their equations (34) 
to (40). They then assert that it is possible to simply employ this in 
place of p{iJ,E, 0|m,flat), as in , in order to find p(in|a,S), but 

p(m|a,S) /p(m|S,flat) JJJ d^dEd0p(;U,E,0|fn, S, flat) ■ 

(D4) 

The derivation in (D2) and (D3) relied on the fact that 

p(/r,£,0|a) =p(£|;t/,a)p(;U,0). (D5) 

However, this does not carry over when conditioning on the star 
being observed (or not) in each band, i.e. 

p(/r,£:,0|a,S) 5^p(£|A/,a,5)p(/r,0|5). (D6) 

The reason for this can be seen by considering a graph of the model 
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they propose (Fig. Dl) and the rules of d-separation (Pearl 1988; 
Geiger, Verma & Pearl 1990). Specifically, in order for E to be 
independent of 0, the apparent magnitudes m and its children must 
not appear in the conditioning set. In (D5) this requirement is met. 
However, once conditioned upon S, as in (D6), this is no longer the 
case, since S is a child of m. 

Fortunately, it is possible to remedy this problem. A sensible 
starting point for considering conditioning the likelihood on the fact 
that the data have been observed is to note that 


p(fn|a,S) 


p{m,S\a) 

p(S|a) 


(D7) 


where 


p{m,S\a) = JJJ d^jdEd&p{m,S\^t,E,@)p{E\^i,a)p{iJ,&) 

= p(m,S|flat) JJJ d,udEd0p(/r,E,0|m,5,flat) ^^^^^p“^^ , 

(D8) 

where p(/J,£, 0|m,5,flat) is given by their (36). Meanwhile, 


p(S|a) = JJJJ diJdEd&dmp{S\m)p{m,S\iJ,E,&)p{E\ij,a). 

(D9) 

This is the average probability of any given star being observed 
subject to some distance extinction relationship as parametrized by 
OL. If extinction quickly rises to large values this probability will be 
small, whilst if extinction is small at all distances this probability 
will be much greater. As a consequence, the effect of this term will 
be to push the inferred distance-extinction relationship towards val¬ 
ues the selection function might normally prevent it reaching. So, 
for example, it will account for the more extinguished stars that 
would have been observed if only they were not too faint. 
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